Multiple Table Lookup Implementation of Error Correction on an FPGA

Authors

  • Amila Akagić
  • Hideharu Amano
Abstract

A substantial number of recent research papers have explored accelerating CRC computation as a way of gaining a significant performance increase in protocol processing. Most software implementations depend on the speed and architectural improvements of general-purpose processors. It has been reported that a single CPU core still cannot achieve 10 Gbps rates when receiving and processing bulk data (>2 KB) [1]. The widely used Sarwate algorithm is being replaced by the more recently proposed Slicing-by-8 algorithm, whose highest reported throughput is 3.6 Gbps on a 1.7 GHz Pentium M processor [2]. CRC algorithms are highly sequential, so it is not known how they will take advantage of multicore processors. We believe that new technologies have to be considered in order to achieve higher throughput (e.g. 10 Gbps or 40 Gbps).

Based on the framework of [2], we design and implement the Slicing-by-{4, 8, 16, 32} algorithms on a Xilinx Virtex 5 LX30 device (5vlx30ff676) at speed grade -3. The algorithms read and process 32, 64, 128 and 256 bits of input data at a time, respectively. They are based on two principles of modulo-2 arithmetic, bit slicing and bit replacement, and each uses a different number of lookup tables with precomputed CRC values. The block diagram of Slicing-by-4 is shown in Fig. 1. The other implementations differ only in the way they access memory and in the number of lookup tables used. We present the results in Table I. The maximum throughput attained is 79.86 Gbps when processing 256 bits at a time. The other implementations achieve the following maximum throughputs: 12.66 Gbps for 32 bits, 24.93 Gbps for 64 bits and 41.99 Gbps for 128 bits at a time.

The input buffer for Slicing-by-{4, 8, 16} is implemented to read 128 bits from a bus, and the one for Slicing-by-32 reads 256 bits. The Slicing-by-4 algorithm reads 32 bits at a time, so our implementation uses a mux to select one of the four possible words from the input buffer (Fig. 1). This value is stored in a register that is used as the first operand of the first XOR circuit. The second operand depends on the current iteration step: the initial value, defined by the CRC standard, is used only in the first iteration, and the other iterations use the previously calculated CRC value. The output from the second XOR circuit is then sliced into 8-bit slices, which are used to access the tables (T1-4 in Fig. 1). The lookup tables are implemented as 256x32-bit ROM modules. The outputs from these tables are XORed and the result is saved into another register. This value is used in the next iteration. In the final stage, the last CRC value is XORed with a final value.

Fig. 1. Block diagram of our design for the Slicing-by-4 algorithm.
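
The Slicing-by-4 data path described in the abstract can be illustrated with a small software model. The following is a minimal sketch, not the authors' hardware design. It assumes the CRC-32 parameters used by IEEE 802.3 (reflected polynomial 0xEDB88320, initial value 0xFFFFFFFF, final XOR value 0xFFFFFFFF), since the abstract does not name the CRC standard. The names `make_tables` and `crc32_slicing_by_4` are hypothetical, and the 128-bit input buffer and mux are not modeled.

```python
POLY = 0xEDB88320  # reflected CRC-32 polynomial (assumed; the abstract names no standard)


def make_tables(n_tables=4):
    """Build n_tables lookup tables of 256 32-bit entries each."""
    base = []
    for byte in range(256):
        crc = byte
        for _ in range(8):  # bitwise CRC of a single byte
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        base.append(crc)
    tables = [base]
    for k in range(1, n_tables):
        prev = tables[k - 1]
        # Table k is table k-1 advanced by one additional zero byte.
        tables.append([(prev[b] >> 8) ^ base[prev[b] & 0xFF] for b in range(256)])
    return tables


TABLES = make_tables(4)  # software stand-in for the four 256x32-bit ROMs


def crc32_slicing_by_4(data, init=0xFFFFFFFF, final_xor=0xFFFFFFFF):
    """Process 32 bits per iteration using four table lookups."""
    t0, t1, t2, t3 = TABLES
    crc = init  # the initial value is used only in the first iteration
    n_words = len(data) // 4
    for i in range(n_words):
        word = int.from_bytes(data[4 * i:4 * i + 4], "little")
        x = crc ^ word  # XOR the input word with the running CRC
        # Slice into four 8-bit indices, look each one up, XOR the outputs.
        crc = (t3[x & 0xFF] ^ t2[(x >> 8) & 0xFF]
               ^ t1[(x >> 16) & 0xFF] ^ t0[(x >> 24) & 0xFF])
    # Leftover bytes (fewer than 4) fall back to the one-table Sarwate step.
    for b in data[4 * n_words:]:
        crc = (crc >> 8) ^ t0[(crc ^ b) & 0xFF]
    return crc ^ final_xor  # final stage: XOR with the final value
```

Under these assumed parameters, `crc32_slicing_by_4(b"123456789")` returns `0xCBF43926`, the standard CRC-32 check value. Each loop iteration corresponds to one pass through the XOR, slice, lookup and XOR stages of the abstract; the byte-wise tail loop is a software convenience for inputs that are not a multiple of 32 bits.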

Similar resources

FPGA Implementation of a Sub-pixel Correction Algorithm for Active Laser Range Finders

This paper presents an FPGA implementation of a sub-pixel correction algorithm for active laser range finders. It shows how to replace complex CPU operations with an efficient use of arithmetic functional units and lookup tables (LUTs). This leads to a less complex architecture and an increase in performance. The architecture of a processor element, its complexity and performance on a Xilinx FPGA de...


Implementation of CRC and Viterbi algorithm on FPGA

A Cyclic Redundancy Code (CRC) provides a simple, yet powerful, method for the detection of errors during digital data transmission and storage. Convolutional Coding and Decoding (CODEC) is a Forward Error Correction (FEC) technique that is particularly suited for a channel in which the transmitted signal is corrupted mainly by Additive White Gaussian Noise (AWGN). The Viterbi Algorithm (VA...


Address Generation Circuitry of WiMAX Deinterleaver using Xilinx FPGA

The paper reports the implementation of the address generator of the 2-D deinterleaver used in a WiMAX transceiver system on an FPGA. The bit streams in the channel interleaver and deinterleaver for the IEEE 802.16e standard involve a floor function, which is very difficult to implement on an FPGA kit. The proposed algorithm eliminates the requirement for the floor function and also reduces the...


An Arbiter PUF Secured by Remote Random Reconfigurations of an FPGA

We present a practical and highly secure method for the authentication of chips based on a new concept for implementing a strong Physical Unclonable Function (PUF) on field programmable gate arrays (FPGAs). Its qualitatively novel feature is a remote reconfiguration in which the delay stages of the PUF are arranged in a random pattern within a subset of the FPGA’s gates. Before the reconfiguration...


An Efficient LUT Design on FPGA for Memory-Based Multiplication

An efficient Lookup Table (LUT) design for a memory-based multiplier is proposed. This multiplier is well suited to DSP computation in which one of the multiplier inputs, the filter coefficient, is fixed. In this design, all possible product terms of the input multiplicand with the fixed coefficient are stored directly in memory. In contrast to an earlier proposition Odd Multiple Storage ...


Publication date: 2010